Character Segmentation of Sindhi, an Arabic Style Scripting Language, using Height Profile Vector

Authors

  • Noor Ahmed Shaikh
  • Ghulam Ali Mallah
  • Zubair A. Shaikh
Abstract

This paper addresses the problem of segmenting sub-words of printed Sindhi, a language written in an Arabic-style script, into characters. Sindhi text, whether printed or handwritten, is cursive: consecutive characters in a word are mostly joined to each other. The proposed segmentation algorithm first computes the Height Profile Vector (HPV) of the thinned primary stroke of a sub-word and analyzes it to segment the sub-word into its constituent characters. From this analysis, the number and locations of the possible segmentation points (PSPs) are determined. The number of PSPs gives a rough estimate of the number of characters in the sub-word. The data around the last PSP is then analyzed further to determine the exact number of characters in the sub-word. Because the Sindhi character set is a superset of the Arabic character set, the proposed algorithm may also be usable for segmenting text written in other languages that use the Arabic script.
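The abstract does not define the HPV or the PSP criterion in detail, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the HPV stores, for each pixel column of the thinned primary stroke, the height of the topmost stroke pixel measured from the bottom of the image, and that a PSP is the midpoint of a run of columns whose height stays at a baseline level. The function names, the `baseline_height` and `min_run` parameters, and the toy image are all hypothetical.

```python
from typing import List


def height_profile_vector(stroke: List[List[int]]) -> List[int]:
    """Assumed HPV: per column, height of the topmost stroke pixel
    measured from the bottom row (0 if the column has no stroke pixel)."""
    rows = len(stroke)
    cols = len(stroke[0]) if rows else 0
    hpv = []
    for c in range(cols):
        height = 0
        for r in range(rows):  # scan each column from the top row downward
            if stroke[r][c]:
                height = rows - r
                break
        hpv.append(height)
    return hpv


def possible_segmentation_points(
    hpv: List[int], baseline_height: int, min_run: int = 2
) -> List[int]:
    """Assumed PSP rule: midpoint of every run of at least `min_run`
    consecutive columns whose height equals `baseline_height`."""
    psps = []
    run_start = None
    for c, h in enumerate(hpv + [-1]):  # sentinel closes a trailing run
        if h == baseline_height:
            if run_start is None:
                run_start = c
        elif run_start is not None:
            if c - run_start >= min_run:
                psps.append((run_start + c - 1) // 2)
            run_start = None
    return psps


# Hypothetical thinned stroke: a horizontal baseline with two vertical rises.
TOY_STROKE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    profile = height_profile_vector(TOY_STROKE)
    print(profile)
    print(possible_segmentation_points(profile, baseline_height=2))
```

In this sketch the PSP count is only the rough estimate the abstract mentions. The further analysis of the data around the last PSP, which the paper uses to settle the exact character count, is not reproduced here because the abstract does not describe it.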

Similar resources

A Study of Sindhi Related and Arabic Script Adapted languages Recognition

1. INTRODUCTION Character recognition for Roman-script languages, especially English, has come close to perfection and is considered one of the successful applications in the field of computer vision. Work on Arabic and other scripts is continuing, but work on the languages that have adopted the Arabic script is very limited, and work on the Sindhi language is still at its beginning. T...

Word-level recognition of multifont Arabic text using a feature vector matching approach

Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters. This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition. A word-level recognition syste...

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and the second momentum term in feed-forward neural networks. The analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

Phonetic based SoundEx & ShapeEx algorithm for Sindhi Spell Checker System

This paper presents a novel combinational phonetic algorithm for the Sindhi language, to be used in developing a Sindhi spell checker, which had not been developed prior to this work. The compound textual forms and glyphs of the Sindhi language present a substantial challenge for developing a Sindhi spell checker system and generating a list of similar suggestions for misspelled words. In order to implement ...

A Proposed Hybrid Technique for Recognizing Arabic Characters

Optical character recognition systems improve human-machine interaction and are urgently required by many governmental and commercial departments. Considerable progress has been achieved in recognition techniques for Latin and Chinese characters. By contrast, Arabic Optical Character Recognition (AOCR) is still lagging although the interest and research in this area is becoming more inten...

Publication date: 2012